Overview

Dataset statistics

Number of variables: 8
Number of observations: 785
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 44.5 KiB
Average record size in memory: 58.1 B

Variable types

Categorical: 1
Numeric: 7

Warnings

numcol is highly correlated with totalprod and 2 other fields (High correlation)
totalprod is highly correlated with numcol and 2 other fields (High correlation)
stocks is highly correlated with numcol and 2 other fields (High correlation)
prodvalue is highly correlated with numcol and 2 other fields (High correlation)
numcol is highly correlated with totalprod and 2 other fields (High correlation)
yieldpercol is highly correlated with totalprod (High correlation)
totalprod is highly correlated with numcol and 3 other fields (High correlation)
stocks is highly correlated with numcol and 2 other fields (High correlation)
prodvalue is highly correlated with numcol and 2 other fields (High correlation)
numcol is highly correlated with totalprod and 2 other fields (High correlation)
totalprod is highly correlated with numcol and 2 other fields (High correlation)
stocks is highly correlated with numcol and 2 other fields (High correlation)
prodvalue is highly correlated with numcol and 2 other fields (High correlation)
totalprod is highly correlated with prodvalue and 4 other fields (High correlation)
prodvalue is highly correlated with totalprod and 3 other fields (High correlation)
state is highly correlated with totalprod and 4 other fields (High correlation)
yieldpercol is highly correlated with totalprod and 2 other fields (High correlation)
numcol is highly correlated with totalprod and 3 other fields (High correlation)
priceperlb is highly correlated with year (High correlation)
year is highly correlated with priceperlb (High correlation)
stocks is highly correlated with totalprod and 4 other fields (High correlation)

Reproduction

Analysis started: 2021-06-12 17:17:41.960933
Analysis finished: 2021-06-12 17:17:52.089456
Duration: 10.13 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

state
Categorical

HIGH CORRELATION

Distinct: 44
Distinct (%): 5.6%
Missing: 0
Missing (%): 0.0%
Memory size: 6.3 KiB
Indiana: 19
Montana: 19
Pennsylvania: 19
North Dakota: 19
South Dakota: 19
Other values (39): 690

Length

Max length: 14
Median length: 8
Mean length: 8.080254777
Min length: 4

Characters and Unicode

Total characters: 6343
Distinct characters: 45
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
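
As a rough illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to count general categories over the five sample state names listed in this report. The helper name `category_counts` is hypothetical, and this is not necessarily how the profiling tool derives its figures. The codes `Lu`, `Ll` and `Zs` correspond to Uppercase Letter, Lowercase Letter and Space Separator.

```python
import unicodedata
from collections import Counter


def category_counts(strings):
    """Count Unicode general categories over all characters in the given strings."""
    return Counter(unicodedata.category(ch) for s in strings for ch in s)


# The five sample rows of the `state` variable shown in this report.
sample = ["Alabama", "Arizona", "Arkansas", "California", "Colorado"]
counts = category_counts(sample)

for category, count in counts.most_common():
    print(category, count)
```

A multi-word name such as "North Dakota" would additionally contribute one `Zs` (space separator) character.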

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Alabama
2nd row: Arizona
3rd row: Arkansas
4th row: California
5th row: Colorado

Common Values

Value               Count   Frequency (%)
Indiana             19      2.4%
Montana             19      2.4%
Pennsylvania        19      2.4%
North Dakota        19      2.4%
South Dakota        19      2.4%
Utah                19      2.4%
Missouri            19      2.4%
Michigan            19      2.4%
Mississippi         19      2.4%
Illinois            19      2.4%
Other values (34)   595     75.8%

Length

[Figure: histogram of lengths of the category]
Value               Count   Frequency (%)
new                 53      5.8%
virginia            38      4.1%
north               38      4.1%
dakota              38      4.1%
carolina            25      2.7%
south               25      2.7%
florida             19      2.1%
arkansas            19      2.1%
west                19      2.1%
pennsylvania        19      2.1%
Other values (35)   627     68.2%

Most occurring characters

Value               Count   Frequency (%)
a                   837     13.2%
i                   686     10.8%
n                   582     9.2%
o                   546     8.6%
s                   437     6.9%
e                   383     6.0%
r                   335     5.3%
t                   234     3.7%
l                   170     2.7%
h                   164     2.6%
Other values (35)   1969    31.0%

Most occurring categories

Value               Count   Frequency (%)
Lowercase Letter    5288    83.4%
Uppercase Letter    920     14.5%
Space Separator     135     2.1%

Most frequent character per category

Lowercase Letter
Value               Count   Frequency (%)
a                   837     15.8%
i                   686     13.0%
n                   582     11.0%
o                   546     10.3%
s                   437     8.3%
e                   383     7.2%
r                   335     6.3%
t                   234     4.4%
l                   170     3.2%
h                   164     3.1%
Other values (14)   914     17.3%

Uppercase Letter
Value               Count   Frequency (%)
M                   135     14.7%
N                   121     13.2%
I                   76      8.3%
W                   76      8.3%
C                   63      6.8%
A                   57      6.2%
V                   57      6.2%
O                   44      4.8%
K                   38      4.1%
D                   38      4.1%
Other values (10)   215     23.4%

Space Separator
Value               Count   Frequency (%)
(space)             135     100.0%

Most occurring scripts

Value               Count   Frequency (%)
Latin               6208    97.9%
Common              135     2.1%

Most frequent character per script

Latin
Value               Count   Frequency (%)
a                   837     13.5%
i                   686     11.1%
n                   582     9.4%
o                   546     8.8%
s                   437     7.0%
e                   383     6.2%
r                   335     5.4%
t                   234     3.8%
l                   170     2.7%
h                   164     2.6%
Other values (34)   1834    29.5%

Common
Value               Count   Frequency (%)
(space)             135     100.0%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               6343    100.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
a                   837     13.2%
i                   686     10.8%
n                   582     9.2%
o                   546     8.6%
s                   437     6.9%
e                   383     6.0%
r                   335     5.3%
t                   234     3.7%
l                   170     2.7%
h                   164     2.6%
Other values (35)   1969    31.0%

numcol
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 164
Distinct (%): 20.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 61686.6242
Minimum: 2000
Maximum: 510000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.3 KiB

Quantile statistics

Minimum: 2000
5th percentile: 5000
Q1: 9000
Median: 26000
Q3: 65000
95th percentile: 269000
Maximum: 510000
Range: 508000
Interquartile range (IQR): 56000

Descriptive statistics

Standard deviation: 92748.94046
Coefficient of variation (CV): 1.50355027
Kurtosis: 7.675632202
Mean: 61686.6242
Median Absolute Deviation (MAD): 19000
Skewness: 2.724034252
Sum: 48424000
Variance: 8602365956
Monotonicity: Not monotonic
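
Several of the figures reported for numcol are simple functions of the others. The following sketch, which only re-derives values from the numbers printed in this report (the variable names are illustrative), shows how the range, IQR, coefficient of variation, variance and mean relate to one another. It is a consistency check under those assumptions, not the tool's own computation.

```python
# Values copied from the numcol statistics in this report.
n_observations = 785
total = 48424000            # Sum
mean = 61686.6242           # Mean
std = 92748.94046           # Standard deviation
q1, q3 = 9000, 65000        # Q1 and Q3
minimum, maximum = 2000, 510000

value_range = maximum - minimum      # Range
iqr = q3 - q1                        # Interquartile range (IQR)
cv = std / mean                      # Coefficient of variation (CV)
variance = std ** 2                  # Variance is the squared standard deviation
mean_from_sum = total / n_observations

print(value_range, iqr)
print(round(cv, 6), round(variance), round(mean_from_sum, 4))
```

The same relationships hold for the other numeric variables in the report, up to the rounding of the printed values.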
[Figure: histogram with fixed size bins (bins=50)]

Most frequent values
Value               Count   Frequency (%)
7000                51      6.5%
6000                37      4.7%
8000                35      4.5%
9000                30      3.8%
5000                27      3.4%
10000               26      3.3%
11000               20      2.5%
14000               18      2.3%
4000                17      2.2%
12000               15      1.9%
Other values (154)  509     64.8%

Smallest values
Value               Count   Frequency (%)
2000                1       0.1%
3000                8       1.0%
4000                17      2.2%
5000                27      3.4%
6000                37      4.7%
7000                51      6.5%
8000                35      4.5%
9000                30      3.8%
10000               26      3.3%
11000               20      2.5%

Largest values
Value               Count   Frequency (%)
510000              1       0.1%
490000              2       0.3%
485000              1       0.1%
480000              3       0.4%
470000              1       0.1%
465000              1       0.1%
460000              2       0.3%
450000              2       0.3%
440000              1       0.1%
420000              1       0.1%

yieldpercol
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 98
Distinct (%): 12.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 60.57834395
Minimum: 19
Maximum: 136
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.3 KiB

Quantile statistics

Minimum: 19
5th percentile: 34
Q1: 46
Median: 58
Q3: 72
95th percentile: 96
Maximum: 136
Range: 117
Interquartile range (IQR): 26

Descriptive statistics

Standard deviation: 19.4278306
Coefficient of variation (CV): 0.3207058717
Kurtosis: 0.5848019806
Mean: 60.57834395
Median Absolute Deviation (MAD): 12
Skewness: 0.7464778953
Sum: 47554
Variance: 377.4406018
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Most frequent values
Value               Count   Frequency (%)
50                  25      3.2%
46                  24      3.1%
51                  23      2.9%
60                  21      2.7%
48                  21      2.7%
52                  20      2.5%
66                  18      2.3%
55                  18      2.3%
70                  17      2.2%
61                  17      2.2%
Other values (88)   581     74.0%

Smallest values
Value               Count   Frequency (%)
19                  1       0.1%
20                  1       0.1%
21                  1       0.1%
22                  1       0.1%
23                  1       0.1%
26                  3       0.4%
27                  4       0.5%
28                  2       0.3%
29                  1       0.1%
30                  4       0.5%

Largest values
Value               Count   Frequency (%)
136                 1       0.1%
131                 1       0.1%
128                 1       0.1%
124                 1       0.1%
122                 1       0.1%
121                 1       0.1%
118                 2       0.3%
116                 1       0.1%
115                 2       0.3%
114                 2       0.3%

totalprod
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 625
Distinct (%): 79.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4140956.688
Minimum: 84000
Maximum: 46410000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.3 KiB

Quantile statistics

Minimum: 84000
5th percentile: 231600
Q1: 470000
Median: 1500000
Q3: 4096000
95th percentile: 19856000
Maximum: 46410000
Range: 46326000
Interquartile range (IQR): 3626000

Descriptive statistics

Standard deviation: 6884593.859
Coefficient of variation (CV): 1.662561185
Kurtosis: 9.657821906
Mean: 4140956.688
Median Absolute Deviation (MAD): 1176000
Skewness: 2.991733525
Sum: 3250651000
Variance: 4.73976326 × 10^13
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Most frequent values
Value               Count   Frequency (%)
280000              6       0.8%
408000              6       0.8%
336000              5       0.6%
288000              5       0.6%
324000              5       0.6%
276000              5       0.6%
260000              4       0.5%
770000              4       0.5%
330000              4       0.5%
385000              4       0.5%
Other values (615)  737     93.9%

Smallest values
Value               Count   Frequency (%)
84000               1       0.1%
120000              1       0.1%
123000              1       0.1%
136000              1       0.1%
138000              1       0.1%
141000              1       0.1%
150000              2       0.3%
153000              1       0.1%
156000              2       0.3%
159000              1       0.1%

Largest values
Value               Count   Frequency (%)
46410000            1       0.1%
42140000            1       0.1%
37830000            1       0.1%
37350000            1       0.1%
36260000            1       0.1%
36000000            1       0.1%
34650000            1       0.1%
34500000            1       0.1%
33670000            1       0.1%
33120000            2       0.3%

stocks
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 584
Distinct (%): 74.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1257629.299
Minimum: 8000
Maximum: 13800000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.3 KiB

Quantile statistics

Minimum: 8000
5th percentile: 41200
Q1: 119000
Median: 391000
Q3: 1380000
95th percentile: 5938800
Maximum: 13800000
Range: 13792000
Interquartile range (IQR): 1261000

Descriptive statistics

Standard deviation: 2211793.817
Coefficient of variation (CV): 1.758700929
Kurtosis: 11.70453917
Mean: 1257629.299
Median Absolute Deviation (MAD): 332000
Skewness: 3.275719046
Sum: 987239000
Variance: 4.892031889 × 10^12
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Most frequent values
Value               Count   Frequency (%)
92000               6       0.8%
69000               6       0.8%
95000               5       0.6%
104000              5       0.6%
152000              4       0.5%
151000              4       0.5%
189000              4       0.5%
52000               4       0.5%
86000               4       0.5%
106000              4       0.5%
Other values (574)  739     94.1%

Smallest values
Value               Count   Frequency (%)
8000                1       0.1%
12000               2       0.3%
13000               1       0.1%
14000               2       0.3%
17000               2       0.3%
19000               1       0.1%
21000               3       0.4%
23000               1       0.1%
24000               1       0.1%
25000               1       0.1%

Largest values
Value               Count   Frequency (%)
13800000            1       0.1%
13545000            1       0.1%
13046000            1       0.1%
12995000            1       0.1%
12796000            1       0.1%
12326000            1       0.1%
12220000            1       0.1%
12127000            1       0.1%
11970000            1       0.1%
11818000            1       0.1%

priceperlb
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 273
Distinct (%): 34.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.695159236
Minimum: 0.49
Maximum: 7.09
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.3 KiB

Quantile statistics

Minimum: 0.49
5th percentile: 0.62
Q1: 1.05
Median: 1.48
Q3: 2.04
95th percentile: 3.658
Maximum: 7.09
Range: 6.6
Interquartile range (IQR): 0.99

Descriptive statistics

Standard deviation: 0.930623423
Coefficient of variation (CV): 0.5489887932
Kurtosis: 3.437928907
Mean: 1.695159236
Median Absolute Deviation (MAD): 0.51
Skewness: 1.568036977
Sum: 1330.7
Variance: 0.8660599555
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Most frequent values
Value               Count   Frequency (%)
1.41                10      1.3%
1.42                10      1.3%
1.4                 9       1.1%
0.59                9       1.1%
0.65                9       1.1%
0.72                9       1.1%
0.68                8       1.0%
1.96                8       1.0%
0.64                8       1.0%
1.43                8       1.0%
Other values (263)  697     88.8%

Smallest values
Value               Count   Frequency (%)
0.49                1       0.1%
0.52                2       0.3%
0.53                2       0.3%
0.54                2       0.3%
0.55                2       0.3%
0.56                1       0.1%
0.57                5       0.6%
0.58                3       0.4%
0.59                9       1.1%
0.6                 7       0.9%

Largest values
Value               Count   Frequency (%)
7.09                1       0.1%
5.85                1       0.1%
5.53                1       0.1%
5.43                1       0.1%
5.42                1       0.1%
4.99                1       0.1%
4.89                1       0.1%
4.88                1       0.1%
4.78                1       0.1%
4.68                1       0.1%

prodvalue
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 733
Distinct (%): 93.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5489738.854
Minimum: 162000
Maximum: 83859000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.3 KiB

Quantile statistics

Minimum: 162000
5th percentile: 377000
Q1: 901000
Median: 2112000
Q3: 5559000
95th percentile: 23121200
Maximum: 83859000
Range: 83697000
Interquartile range (IQR): 4658000

Descriptive statistics

Standard deviation: 9425393.878
Coefficient of variation (CV): 1.716911155
Kurtosis: 20.34189798
Mean: 5489738.854
Median Absolute Deviation (MAD): 1469000
Skewness: 3.960801547
Sum: 4309445000
Variance: 8.883804976 × 10^13
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Most frequent values
Value               Count   Frequency (%)
736000              3       0.4%
2122000             3       0.4%
440000              3       0.4%
651000              3       0.4%
590000              3       0.4%
259000              3       0.4%
2552000             2       0.3%
1361000             2       0.3%
845000              2       0.3%
1256000             2       0.3%
Other values (723)  759     96.7%

Smallest values
Value               Count   Frequency (%)
162000              1       0.1%
173000              1       0.1%
174000              1       0.1%
179000              1       0.1%
186000              1       0.1%
210000              1       0.1%
221000              1       0.1%
235000              1       0.1%
238000              1       0.1%
249000              1       0.1%

Largest values
Value               Count   Frequency (%)
83859000            1       0.1%
69986000            1       0.1%
69615000            1       0.1%
67565000            1       0.1%
65268000            1       0.1%
63590000            1       0.1%
54542000            1       0.1%
50669000            1       0.1%
48960000            1       0.1%
47817000            1       0.1%

year
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 19
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2006.817834
Minimum: 1998
Maximum: 2016
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.3 KiB

Quantile statistics

Minimum: 1998
5th percentile: 1998
Q1: 2002
Median: 2007
Q3: 2012
95th percentile: 2015.8
Maximum: 2016
Range: 18
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 5.491522957
Coefficient of variation (CV): 0.002736433204
Kurtosis: -1.212078911
Mean: 2006.817834
Median Absolute Deviation (MAD): 5
Skewness: 0.04866639741
Sum: 1575352
Variance: 30.15682439
Monotonicity: Increasing
[Figure: histogram with fixed size bins (bins=19)]

Most frequent values
Value               Count   Frequency (%)
2001                44      5.6%
2002                44      5.6%
2003                44      5.6%
1998                43      5.5%
2000                43      5.5%
1999                43      5.5%
2006                41      5.2%
2008                41      5.2%
2007                41      5.2%
2005                41      5.2%
Other values (9)    360     45.9%

Smallest values
Value               Count   Frequency (%)
1998                43      5.5%
1999                43      5.5%
2000                43      5.5%
2001                44      5.6%
2002                44      5.6%
2003                44      5.6%
2004                41      5.2%
2005                41      5.2%
2006                41      5.2%
2007                41      5.2%

Largest values
Value               Count   Frequency (%)
2016                40      5.1%
2015                40      5.1%
2014                40      5.1%
2013                39      5.0%
2012                40      5.1%
2011                40      5.1%
2010                40      5.1%
2009                40      5.1%
2008                41      5.2%
2007                41      5.2%

Interactions

[Figures: 49 pairwise interaction plots between variables; images not preserved in this extract.]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r; only the sign of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
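
The calculation described above can be sketched in plain Python as follows. The function name `pearson_r` is illustrative, and the data are the numcol and totalprod values from the first five sample rows of this report; this is a minimal sketch, not the implementation the profiling tool uses.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (std_x * std_y)


# numcol and totalprod for Alabama, Arizona, Arkansas, California, Colorado (1998).
numcol = [16000, 55000, 53000, 450000, 27000]
totalprod = [1136000, 3300000, 3445000, 37350000, 1944000]

r = pearson_r(numcol, totalprod)
print(round(r, 4))
```

Rescaling or shifting either input, for example expressing numcol in thousands, leaves the result unchanged up to floating-point error.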

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
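
A minimal sketch of this procedure: rank each variable, then apply the Pearson formula to the ranks. The helper names are hypothetical, and assigning tied values the average of their ranks is an assumption of this sketch (a common convention, but not stated in the report).

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but non-linear relationship: y = x ** 3.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For this cubic relationship ρ reaches its maximum while Pearson's r stays below 1, which is the sense in which ρ captures nonlinear monotonic association.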

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
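
The pair-counting definition above can be sketched directly. The function name `kendall_tau_a` is illustrative. Treating tied pairs as neither concordant nor discordant is an assumption of this sketch; libraries often apply a tie-corrected variant instead.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:        # both variables move in the same direction
            concordant += 1
        elif direction < 0:      # the variables move in opposite directions
            discordant += 1
        # direction == 0 means a tie in x or y: counted as neither
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# yieldpercol and priceperlb from the first five sample rows of this report.
yieldpercol = [71, 60, 65, 83, 72]
priceperlb = [0.72, 0.64, 0.59, 0.62, 0.70]
tau = kendall_tau_a(yieldpercol, priceperlb)
print(tau)
```

This direct version compares every pair and so runs in quadratic time, which is fine for illustration but slow for large datasets.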

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[Figure: nullity by column] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

     state           numcol       yieldpercol  totalprod      stocks         priceperlb  prodvalue      year
0    Alabama         16000.000    71           1136000.000    159000.000     0.720       818000.000     1998
1    Arizona         55000.000    60           3300000.000    1485000.000    0.640       2112000.000    1998
2    Arkansas        53000.000    65           3445000.000    1688000.000    0.590       2033000.000    1998
3    California      450000.000   83           37350000.000   12326000.000   0.620       23157000.000   1998
4    Colorado        27000.000    72           1944000.000    1594000.000    0.700       1361000.000    1998
5    Florida         230000.000   98           22540000.000   4508000.000    0.640       14426000.000   1998
6    Georgia         75000.000    56           4200000.000    307000.000     0.690       2898000.000    1998
7    Hawaii          8000.000     118          944000.000     66000.000      0.770       727000.000     1998
8    Idaho           120000.000   50           6000000.000    2220000.000    0.650       3900000.000    1998
9    Illinois        9000.000     71           639000.000     204000.000     1.190       760000.000     1998

Last rows

     state           numcol       yieldpercol  totalprod      stocks         priceperlb  prodvalue      year
775  South Dakota    280000.000   71           19880000.000   12127000.000   1.760       34989000.000   2016
776  Tennessee       6000.000     55           330000.000     69000.000      4.880       1610000.000    2016
777  Texas           133000.000   70           9310000.000    2607000.000    2.080       19365000.000   2016
778  Utah            31000.000    32           992000.000     169000.000     1.930       1915000.000    2016
779  Vermont         6000.000     52           312000.000     69000.000      3.640       1136000.000    2016
780  Virginia        5000.000     38           190000.000     30000.000      5.850       1112000.000    2016
781  Washington      84000.000    35           2940000.000    412000.000     1.990       5851000.000    2016
782  West Virginia   5000.000     32           160000.000     43000.000      3.920       627000.000     2016
783  Wisconsin       54000.000    62           3348000.000    1205000.000    2.670       8939000.000    2016
784  Wyoming         40000.000    68           2720000.000    190000.000     1.780       4842000.000    2016